Round 1: Technical - 1
Spark (PySpark)
🔹 Word Count Problem
1. Modify the code so that the word counts are output in descending order of frequency.
2. Why is reduceByKey used instead of groupByKey?
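One possible way to approach both parts of the word count problem is sketched below. The original code from the interview is not available, so the names tokenize, by_count_desc, word_count_desc and main, along with the sample sentences, are assumptions made for illustration. The sketch uses the RDD API: reduceByKey sums the counts inside each partition before the shuffle, whereas groupByKey would send every (word, 1) pair across the network first, which is the usual reason reduceByKey is preferred here.

```python
import re


def tokenize(line):
    """Split a line into lowercase words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def by_count_desc(pair):
    """Sort key: highest count first, then the word alphabetically for ties."""
    word, count = pair
    return (-count, word)


def word_count_desc(sc, lines):
    """Return (word, count) pairs sorted by count, highest first.

    `sc` is an existing SparkContext; `lines` is an iterable of strings.
    """
    counts = (
        sc.parallelize(list(lines))
        .flatMap(tokenize)
        .map(lambda word: (word, 1))
        # Partial sums are computed per partition before data is shuffled.
        .reduceByKey(lambda a, b: a + b)
    )
    return counts.sortBy(by_count_desc).collect()


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()
    )
    try:
        sample = ["to be or not to be", "be quick"]  # hypothetical input
        for word, count in word_count_desc(spark.sparkContext, sample):
            print(word, count)
    finally:
        spark.stop()
```

Calling main() on a machine with PySpark installed should list "be" (3 occurrences) first, followed by "to" (2), and then the words that appear once.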
🔹 What is lineage in Spark?
🔹 Difference between cache and persist in Spark.
🔹 Is fault tolerance the same in Spark and Hadoop?
SQL
🔹 Explain the logical execution order of the clauses in a SQL query.
🔹 What are the different types of joins in SQL?
🔹 Explain the difference between DENSE_RANK and RANK.
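The difference between the two ranking functions is easiest to see with tied values. The sketch below runs the query through Python's built-in sqlite3 module, assuming the bundled SQLite supports window functions (version 3.25 or later). The scores table and its rows are made up for illustration.

```python
import sqlite3


def rank_demo():
    """Return (name, score, rank, dense_rank) rows for a table with a tie."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("a", 100), ("b", 90), ("c", 90), ("d", 80)],  # hypothetical data
    )
    rows = conn.execute(
        """
        SELECT name,
               score,
               RANK()       OVER (ORDER BY score DESC) AS rnk,
               DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
        FROM scores
        ORDER BY score DESC, name
        """
    ).fetchall()
    conn.close()
    return rows


for row in rank_demo():
    print(row)
```

Rows b and c tie on 90 and both receive rank 2. RANK then skips to 4 for row d, while DENSE_RANK continues with 3.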
🔹 What is a cursor in SQL?
🔹 What is a stored procedure in SQL?
Python
🔹 What is a docstring in Python?
🔹 What is pass in Python? When is it used?
🔹 Which data structure occupies more memory: list or tuple? Why?
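A quick way to check the list versus tuple question is sys.getsizeof, as in the sketch below. It assumes CPython; the exact byte counts vary by version and platform, so none are quoted here. A list is mutable, so it keeps a pointer to a separately allocated item array and may reserve spare capacity for appends, while a tuple has a fixed size and stores its item references inline.

```python
import sys

items_list = [1, 2, 3]
items_tuple = (1, 2, 3)

# getsizeof reports the container itself, not the objects it refers to.
list_size = sys.getsizeof(items_list)
tuple_size = sys.getsizeof(items_tuple)

print("list :", list_size, "bytes")
print("tuple:", tuple_size, "bytes")
```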
🔹 Python code to count the frequency of characters in a given text file.
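A minimal sketch for the character-frequency question, using collections.Counter. The function name char_frequency and the UTF-8 default are assumptions. This version counts every character, including spaces and newlines; a filter could be added if only letters should be counted.

```python
from collections import Counter


def char_frequency(path, encoding="utf-8"):
    """Count how often each character occurs in the file at `path`."""
    counts = Counter()
    with open(path, encoding=encoding) as handle:
        # Reading line by line avoids loading a large file into memory at once.
        for line in handle:
            counts.update(line)
    return counts
```

For example, char_frequency("notes.txt").most_common(5) would return the five most frequent characters with their counts, where notes.txt is a placeholder file name.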
🔹 Python code to build a palindrome from the first n letters of the alphabet.
Example: for n = 3 (letters a, b, c), the palindrome is abcba.
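The palindrome question can be sketched as follows: take the first n lowercase letters, then append the same letters in reverse without repeating the middle one. The function name alphabet_palindrome and the 1 to 26 limit are assumptions, since the question does not say what should happen beyond the letter z.

```python
import string


def alphabet_palindrome(n):
    """Build a palindrome from the first n lowercase letters, e.g. 3 -> 'abcba'."""
    if not 1 <= n <= 26:
        raise ValueError("n must be between 1 and 26")
    rising = string.ascii_lowercase[:n]
    # Mirror everything except the last letter so the middle is not doubled.
    return rising + rising[:-1][::-1]


print(alphabet_palindrome(3))  # abcba
```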
Round 2: Technical - 2
AWS
🔹 What is the Data Catalog in AWS Glue?
🔹 Difference between Athena and Aurora.
🔹 What is versioning in S3?
🔹 What are the different data distribution styles in Redshift?
Projects
🔹 Explain the problem statement of your projects and walk through the implementation details.
Round 3: Managerial
🔹 Describe your past experiences.
🔹 Answer scenario-based questions related to your projects or work environment.
Round 4: HR
🔹 Why are you looking for a change?
🔹 Salary negotiation.
🔹 Overview of the company's operations and the types of projects it undertakes.